INTRODUCTION

Carlson and Cohen [1] suggest that "the perfect
image is one that looks like a piece of the world
viewed through a picture frame." They propose
that the metric for the perfect image be the
discriminability of the reconstructed image from
the ideal image the reconstruction is meant to
represent. If these two images, the ideal and the
reconstruction, are noticeably different, then the
reconstruction is less than perfect. If they cannot
be discriminated, then the reconstructed image is
perfect. This definition has the advantage that it
can be used to define "good enough" image
quality. An image that fully satisfies a task's
image quality requirements, for example text
legibility, is selected to be the standard.
Rendered images are then compared to the
standard. Rendered images that are
indiscriminable from the standard are "good
enough." Test patterns and test image sets serve
as standards for many tasks and are commonplace
in the image communications and display
industries, so this is not a new or novel idea.

What does it take to satisfy this definition? How
much information is required? The answer 
depends upon the reconstruction device and the 
observer's human visual system. Which of these 
two elements, the device or the observer, 
dominates the outcome depends upon many 
factors. The obvious factors are lighting and 
viewing distance for the observer and resolution 
(i.e. temporal, spatial, and chromatic) for the
device. Viewed in the dark, all images look the
same [2], so the human visual system dominates
when the lighting is extreme. Similarly, all 
images look the same when viewed from a 
sufficient distance; here again the human visual 
system dominates. For any two renderings of the
same image, there will always be a viewing
distance and lighting condition pair at which the
two renderings are indiscriminable. This distance
defines the dominance boundary between the
display device and the eye. This distance will 
also depend upon the image signal being 
rendered; low-information content images can be 
rendered ‘perfectly’ on low-information devices. 


A unique distance that does not depend upon the
image signal can be defined. Over all possible
images that can be rendered by the display device,
the farthest distance found is the unique
dominance distance. This distance depends
exclusively on the attributes of the device and
observer and not upon the image signal content.
As rendered images move closer to the observer,
the rendering engine becomes the dominant factor
in determining the quality of the rendered image.

THE COST OF INFORMATION 
Information in an image is expensive to gather,
store, process, code, transmit, decode, and
reconstruct. Often the amount of image data
communicated is more than can be rendered on a
soft copy device. For example, a 300 dpi 8 bit
grayscale dye sublimation printer requires
1200x3x1500 pixel values or 43.2 Mbits to
render a 4x5 inch color image. If this image is
rendered as soft copy on an 80 dpi screen with 8
bits of grayscale, only 3.072 Mbits of
information will be rendered on the screen. This
is one fourteenth of the data sent to the printer.
Chances are very good that at a viewing distance
of 1 m or less the soft copy rendering will equal
or exceed the quality of the dye sub print. How
many bits are really required to achieve good
image quality relative to a standard? How many
bits are required to satisfy the indiscriminability
test? Bits that cannot be seen are wasted, adding
cost to the rendering device and possibly other
system components without improving
performance.
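
The arithmetic behind the printer and screen figures can be
checked with a short sketch. The function name and the
assumption of three color channels at 8 bits each are ours,
chosen to reproduce the 1200x3x1500 count quoted in the text.

```python
def rendered_bits(width_in, height_in, dpi, bits_per_channel, channels=3):
    """Bits needed to set every addressable location of the image once."""
    locations = round(width_in * dpi) * round(height_in * dpi)
    return locations * channels * bits_per_channel


# 4x5 inch color image, 8 bits per channel (values taken from the text)
printer_bits = rendered_bits(4, 5, dpi=300, bits_per_channel=8)
screen_bits = rendered_bits(4, 5, dpi=80, bits_per_channel=8)

print(f"printer: {printer_bits / 1e6:.3f} Mbits")
print(f"screen:  {screen_bits / 1e6:.3f} Mbits")
print(f"ratio:   {printer_bits / screen_bits:.1f}")
```

The ratio comes out just above 14, which is the "one
fourteenth" quoted for the soft copy rendering.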

The amount of information in a static digital 
image is given by the Shannon formula 

$$I = \log_2(\text{number of locations} \times \text{number of levels}) \tag{1}$$

A more natural way to conceptualize image 
information relates the information value I to the 
size of the image rendered. Devices can be 
specified in terms of the number of controllable
locations per inch or dpi. In dpi units, the 
information measure is 

$$I = 2\log_2(\text{dpi}) + \log_2(\text{sq. in.}) + \log_2(\text{levels}) \tag{2}$$

Dropping the area term converts the formula to
an information density, I_d, where

$$I_d = 2\log_2(\text{dpi}) + \log_2(\text{levels}) \tag{3}$$

I_d is the sum of two components, spatial
resolution and grayscale resolution. These are
known to trade off against each other [3] once a
minimum spatial resolution has been achieved.
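
Equations (2) and (3) are easy to evaluate numerically. The
sketch below is illustrative only; the function names are ours,
and the 80 dpi, 20 square inch, 256 level example is simply one
color channel of the soft copy case discussed earlier.

```python
import math


def information_density(dpi, levels):
    """Equation (3): bits per square inch on a log2 scale."""
    return 2 * math.log2(dpi) + math.log2(levels)


def total_information(dpi, area_sq_in, levels):
    """Equation (2): equation (3) plus the log2 area term."""
    return information_density(dpi, levels) + math.log2(area_sq_in)


# Doubling dpi adds two bits; doubling the number of levels adds one.
print(information_density(160, 256) - information_density(80, 256))
print(information_density(80, 512) - information_density(80, 256))

# Equation (2) agrees with equation (1) applied to the raw counts.
print(total_information(80, 20, 256), math.log2(320 * 400 * 256))
```

Because dpi enters squared, one doubling of linear resolution
costs as much information as two doublings of the gray level
count.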

A popular printed advertisement [4] for an ink jet
printer, describing the image of a woman in a
bathing suit, states, "At 300 dpi you see a lady in
her bathing suit. At 720 dpi you see her bathing
suit is wet. At 1440 dpi you see her bathing suit
is painted on." As the resolution increases, more
surface details emerge. At the lowest resolution,
halftoning the image makes it look flat; at
moderate resolution, sufficient surface detail is
visible for surfaces to glisten. At the highest
resolution, surface details suggesting depth of
texture are now visible, supporting the advertising
claim.

Although the amount of information rendered by 
the printers in this example is dramatically 
changing, the signal information content is not 
necessarily changing at the same rate. Often
dithering can be accomplished by spatially
upsampling the image signal without requiring
additional image information at the interpolated
locations. Because the amount of information 
required to render an image as soft versus hard 
copy can be different, image signals often contain 
more information than will be rendered on the 
soft copy device. When the signal is rendered as 
soft copy, the image signal must be 
downsampled, and this can produce visible 
artifacts if not done properly. Dithering trades 
spatial resolution for grayscale resolution. The 
cost to dither is that more display locations must 
be controlled in the rendered image. Another cost 
of dithering is its impact on image quality. The 
dither itself can become visible, adding a high 
spatial frequency noise component called ‘fixed
pattern noise’ to the rendered image, degrading
image quality by producing a visible pattern that
can mask details in the image in addition to 
adding an objectionable texture to the rendered 
image. 
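
The trade of spatial resolution for grayscale resolution can be
illustrated with a simple halftone-cell model. This is a sketch
under our own assumption that an n x n cell of 1-bit device
pixels is treated as one effective pixel; it is not the dither
used in any particular device.

```python
def dither_tradeoff(device_dpi, cell_size):
    """Effective dpi and gray levels for a cell_size x cell_size cell
    of 1-bit device pixels (illustrative halftone-cell model)."""
    effective_dpi = device_dpi / cell_size
    gray_levels = cell_size * cell_size + 1  # 0..n*n pixels switched on
    return effective_dpi, gray_levels


for n in (1, 2, 4, 8):
    dpi, levels = dither_tradeoff(600, n)
    print(f"{n}x{n} cell: {dpi:.0f} dpi, {levels} levels")
```

Each doubling of the cell size halves the effective resolution
while roughly quadrupling the number of available gray levels,
and every device pixel in the cell still has to be controlled.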

IMAGE RENDERING COSTS 

There are four elements that affect the cost of
rendering information as soft or hard copy.
These are: (1) the cost of the number of locations
in the rendered image, (2) the cost of controlling
the locations, (3) the cost of communicating
the image data, and (4) the cost of processing the
image data. These factors are not independent.
The number of pixels rendered is the number of 
locations. For soft copy rendering devices, 
increasing the number of pixels can result in 
more gates, a smaller fill factor, and less 
luminous efficiency. Controlling more pixels 
requires more row and column drivers and higher 
bandwidth connectivity from the signal source to 
the rendering device. Driver complexity can be
reduced by temporal and spatial dithering of the
image signal or by subsampling the image data.
By simply downsampling the grayscale, a 2- to
8-fold savings can be generated for many
displays. This can result in simpler and cheaper
drivers, lower bandwidth connectivity, and
reduced EMI, but at a cost of more image
processing. It is often simpler to understand the 
technology trade-off costs in terms of the 
complexity of manufacturing, the cost of 
components and architectures, and the cost of 
added processing, than it is to understand the 
tradeoff cost in terms of image quality. Thus it 
is important to know the limitations that the 
human visual system imposes on this trade-off. 
As previously noted, such limitations are less
severe with increased viewing distance and lower 
brightness. 

HUMAN VISUAL SYSTEM 
Spatial Sampling. Osterberg [5] in 1935 reported
measurements of cone and rod photoreceptor
densities of the human retina. The retina is the
sensory mechanism that is sensitive to light and
transduces it into a neural signal. Osterberg
counted linear densities of approximately 120
cones per degree visual angle in the fovea, the
small retinal region of highest spatial acuity.
More recently, Curcio and her collaborators [6-9]
have measured the cone mosaic linear sampling
densities and find individual variations from as
low as 90 to as high as 190 cones per degree
visual angle in the foveal region. Campbell and
Gubisch [10] have characterized the point-spread
function of the eye’s optics. The blur 
measurements and cone density data taken 
together correspond well with the limiting acuity 
of the average eye of approximately 1 arc minute. 
They are also consistent with the variation of 
best corrected acuity in the population. 
Measurements of the spatial contrast sensitivity
of the eye, however, show that there are
additional attenuation factors that impact the
eye's ability to resolve details [11, 12]; these
factors are believed to be due to neural
processing.

The variation in cone sampling densities in the 
population implies that the dominance boundary 
defined earlier will not be the same for all 
viewers. Some individuals will require greater or 
lesser distances to satisfy an indiscriminability 
criterion for any given comparison. The 120 
samples per degree as a detector density norm is 
consistent with population acuity norms and 
therefore is a good figure of merit for evaluating
display resolution requirements for the average 
observer. 

The viewing distance to the typical office soft or 
hard copy display device is approximately 0.5m. 
At this distance, one inch subtends three degrees
of visual angle. At 120 cone photoreceptors per
degree, it would require 360 dpi to match one
pixel to each cone photoreceptor in the foveal
region. A 600 dpi laser printer is at a higher
spatial resolution than the eye at standard 
viewing distances even for a high-acuity 
individual. One might believe that at 600 dpi the 
laser printer would render ‘perfect’ natural 
images. Black and white images that do not
contain grayscale values between the min and 
max can be ‘perfectly’ rendered on these devices. 
The grayscale of a laser printer, however, is only 
one bit. Grayscale values between the max and 
min reflectances can only be created by dithering. 
The dither trade-off lowers effective spatial 
resolution to less than 150 dpi, and because of
the inherently high contrast of the pixels, the
fixed pattern noise produced by dithering becomes
quite visible at this distance. Remember that the
spatial Fourier power spectrum of a point
contains frequency energy over a broad range of
spatial frequencies and therefore can be detected
by mechanisms of vision that are tuned to any of
these frequencies. Soft copy devices are not
limited to one bit of grayscale, and can produce
good-looking images at lower spatial resolutions 
than laser printers because of the added degrees of 
freedom in grayscale at each pixel location. 
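
The viewing geometry used above can be sketched as follows.
The function names are ours, and the 120 cones per degree
default is the population norm quoted in the text. Exact
trigonometry gives slightly under three degrees per inch at
0.5 m, so the matched resolution comes out a little below the
rounded 360 dpi figure.

```python
import math

INCH_M = 0.0254  # one inch in meters


def degrees_per_inch(viewing_distance_m):
    """Visual angle subtended by one inch centered on the line of sight."""
    return math.degrees(2 * math.atan(INCH_M / (2 * viewing_distance_m)))


def cone_matched_dpi(viewing_distance_m, cones_per_degree=120):
    """dpi at which one pixel falls on each foveal cone."""
    return cones_per_degree * degrees_per_inch(viewing_distance_m)


for distance in (0.25, 0.5, 1.0):
    print(f"{distance} m: {degrees_per_inch(distance):.2f} deg/inch, "
          f"{cone_matched_dpi(distance):.0f} dpi")
```

Passing 90 or 190 for cones_per_degree gives the range implied
by the individual variation in cone density.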

Intensity Sampling. The grayscale resolution of
the eye is limited by two factors. First, the eye,
like any detector system, is limited by noise, both
internal and external: below some minimum
signal level the visual system cannot resolve
signal from noise. A very elegant experiment
and theoretical analysis conducted over 50 years
ago by Hecht et al. [13] revealed that human
vision is remarkably immune to noise in the
visual environment and is limited only by the
quantum nature of light itself. For example, we
would not be able to see laser speckle, which can
be quite random due to mode switching in the
laser, if the eye were less noise immune. The
noise limits of vision are primarily due to
internal or neural noise.

Secondly, the nervous system has a compressive
saturating nonlinearity: above some contrast no
additional internal signal is generated even
though the input signal is increasing. The eye
rapidly adapts, however, so it can be very
difficult to measure this saturating limit, and
thus this limitation of the human visual system
is not a significant factor in grayscale resolution.
There is an additional reason that the saturating
nonlinearity is not a factor. The most frequently
encountered viewing conditions and lighting for
standard direct-view soft-copy displays all lie
within a quite limited range.

Office automation displays have a peak
brightness in the range of 15 to 300 cd m⁻². A
typical display will have a peak brightness of 75
cd m⁻². Ambient lighting in most offices is in a
range of 5 to 75 cd m⁻² measured with a perfectly
reflecting white screen, or approximately 100 to
1000 lux. Under these viewing conditions, the
peak contrast of office devices is not sufficient to
generate saturating signals in the visual pathways
and there is little visually significant adaptation.

The signal-to-noise ratio of the visual system has
been measured empirically in a variety of
experimental settings. The classical
measurements were made by Stiles [14]. Stiles
used a chromatic adaptation methodology to
isolate the cone pathways and then attempted to
measure the signal-to-noise ratio in each cone
system. The measurement is made by viewing a
steady field of fixed intensity, called the adapting
field, and then measuring the intensity increment
of a brief flash required to just see the increment.
The minimal increment that is visible is the
minimal signal required to evoke a sensory
experience; it is a measure of the signal-to-noise
ratio of the system. It is now understood that
Stiles measured the signal-to-noise ratio at
various stages within the cascade of neural
processing taking place within the visual system
[15,16]. Nonetheless, his measurements are
useful measures of the limiting resolution of the
eye. He found that for two of the cone systems
and their associated primary pathways the
minimal ratio was approximately 2% and for the
third cone system 8.7%.

Above a minimal steady-field level, the ratio is
independent of brightness. That is to say, the
incremental or decremental signal required to see
a change is a fixed proportion of the adapting
level [17] once the dc level is above the threshold
level for the field. This is a property of sensory
systems and is called Weber's Law [18]; the ratio
is often referred to as the Weber-Fechner fraction.
We measured the Weber-Fechner fraction for soft
copy displays [19] and found that the red and
green primaries of a typical rendering device have
signal-to-noise ratios just under 2% and for the
blue primary it is around 4%.

The discrepancy between our finding and that of 
Stiles is due in part to the non-isolation 
condition we used to measure these ratios. 
Unlike Stiles, we made no attempt to isolate the
cone pathways, so our signal could have been
sensed by any or all of the cone mechanisms and
the subsequent mechanisms that process their
signals. Since more than one mechanism can
detect the increment, there are multiple chances
to detect it. Our results are consistent with
simple probability summation over these
pathways [20]. Since the spectral tuning of the
cone systems is highly overlapping, the cones
are broadband detectors in the spectral domain,
and the standard rendering-device RGB primaries
produce highly correlated signals within the cone
photoreceptors.

We found in addition that below field levels of
approximately 0.34 cd m⁻² (0.1 fL), the
Weber-Fechner fractions were increasing. This is the
brightness operating level at which sensitivity is
dominated by the absolute threshold of the cone
mechanisms. At this level the field is not
generating any appreciable signal, so only the
increment matters. For a display device this
means that grayscale differences below this level
are very difficult to see, since all input signals
below this intensity level are generating
essentially the same internal event. Added
grayscale steps below this value are wasted.
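
As a rough illustration of how a constant Weber-Fechner
fraction bounds the number of visible luminance steps, the
sketch below counts how many successive just-visible
increments fit between a floor and a peak luminance. This is
our own back-of-the-envelope calculation, not a result reported
in [19]; it assumes the fraction is constant over the whole
range and uses the 0.34 cd m⁻² floor and a 75 cd m⁻² peak
taken from the text.

```python
import math


def weber_steps(l_min, l_max, weber_fraction):
    """Number of successive just-visible increments between l_min and
    l_max when each step multiplies luminance by (1 + weber_fraction)."""
    if l_min <= 0 or l_max < l_min or weber_fraction <= 0:
        raise ValueError("need 0 < l_min <= l_max and weber_fraction > 0")
    return math.floor(math.log(l_max / l_min) / math.log(1 + weber_fraction))


floor_cd = 0.34   # below this level added steps are wasted (from the text)
peak_cd = 75.0    # typical office display peak brightness (from the text)

for name, fraction in (("red/green", 0.02), ("blue", 0.04)):
    steps = weber_steps(floor_cd, peak_cd, fraction)
    print(f"{name}: about {steps} steps, {math.log2(steps):.1f} bits")
```

The count is an upper bound for a single uniform field; it
says nothing about how many levels are needed once spatial
dithering is allowed.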

ENOUGH INFORMATION 
At IDRC '94, we [3] measured the trade-off in
grayscale and spatial resolution for dithered
images empirically with real human observers
and in theory using a computational model of
human vision. A schematic representation of
Fig. 3 from that report is shown in Fig. 1. Fig.
1 plots the locus of points in a space defined by
equation (3), grayscale resolution in log2(levels)
versus spatial resolution in 2log2(dpi), that are
indiscriminable from a high-resolution rendered
image. In the experiment we conducted [3], we
measured the discriminability of a very high
resolution rendered image from downsampled and
dithered renderings of the same image signal.
The rendering device we used in the empirical
part of our experiment had a peak luminance of
25 fL and a measured contrast of 50:1.
Additional details of the experiment can be found
in [3]. The trade-off locus fell along a curve that
could be described as two line segments meeting
at a common point or elbow. This locus is
represented as two solid line segments in Fig. 1.



Figure 1. Schematic representation of trade-off
locus shown in Fig. 3 of [3]. Horizontal axis:
2log2(dpi).

All the points on this locus are downsampled and
dithered renderings of a zone plate [21] and they
are indiscriminable from a very high resolution
rendering of this image signal. The high
resolution image can be thought of as a point
representing some large number of gray levels
(possibly continuous) and some very large
number of dpi (possibly continuous as well). In
Fig. 1 the high resolution point is in the upper
right-hand corner. Our results indicated that
below the elbow the line segment had a slope
close to -1. One bit of grayscale was equivalent
to one bit of spatial resolution in our empirical
data [3]. This locus is labeled as ‘E’ for
‘Equivalent’ in the figure. The elbow is indicated
as an open circle in Fig. 1 and was empirically
found to be approximately at 140 dpi and 8 levels
of grayscale (i.e. the point <14.4, 3> in this
information space) (see [3], Fig. 3).

Note that all of the points to the right and above
this locus of indiscriminable rendered images
require more information to render them. If the
slope of the lower segment is -1 (as it was
approximately found to be in the results of the
empirical and computational experiment [3]),
then the information content of the point at the
elbow and all points below it on the
indiscriminability locus are the lowest possible
bits required to be indiscriminable from a
continuous rendering of the image. In this case
we call the elbow the ‘Superior Information
Nodal Point’ or Superior Nodal Point for point
of least information content. The point where
this line intersects the horizontal line defined by
bits = 1 (the smallest possible number of
quantized grayscale levels) is called the ‘Lesser
Nodal Point.’ For the case where the slope is -1,
these two Nodal Points correspond to the same
amount of information rendered.

The information content of the Superior Nodal
Point is the least information required to be
indiscriminable from a continuous image when
the data follow a locus with slope greater than -1.
This possible locus is shown as the dashed line
segment labeled ‘M’ for ‘More’ in Fig. 1. Here
the Lesser Nodal Point requires more information
to render. Finally, if the locus of points below
the elbow has a slope of less than -1, then the
point of least information content is below the
elbow and to the left of the line of slope -1. In
this case the point of least information would be
the Lesser Nodal Point. A line segment
representing this condition is shown in Fig. 1 as
a dashed line labeled ‘L’ for ‘Less’ information.
The Lesser Nodal Point for this case is indicated
by the solid black circle at the intersection of this
line segment and 1 bit.
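
The three cases can be made concrete with a small sketch. The
elbow coordinates are those given in the text; the slopes -0.5
and -2 are hypothetical values we chose to stand in for the
‘M’ and ‘L’ loci, and the function names are ours.

```python
def lesser_nodal_point(elbow, slope):
    """Intersection of the lower locus segment with the 1-bit line.

    elbow is (x, bits) with x = 2*log2(dpi) and bits = log2(levels);
    slope is the (negative) slope of the segment below the elbow.
    """
    if slope >= 0:
        raise ValueError("the lower segment must have a negative slope")
    x_elbow, bits_elbow = elbow
    x = x_elbow + (1.0 - bits_elbow) / slope
    return (x, 1.0)


def information_density(point):
    """I_d of a point in the (2*log2(dpi), log2(levels)) plane."""
    return point[0] + point[1]


elbow = (14.4, 3.0)  # Superior Nodal Point from the text
# -1 is the measured case 'E'; -0.5 and -2 are hypothetical 'M' and 'L'
for label, slope in (("E", -1.0), ("M", -0.5), ("L", -2.0)):
    lesser = lesser_nodal_point(elbow, slope)
    dpi = 2 ** (lesser[0] / 2)
    print(f"{label}: superior {information_density(elbow):.1f} bits, "
          f"lesser {information_density(lesser):.1f} bits at {dpi:.0f} dpi")
```

With slope -1 both Nodal Points carry the same information;
the shallower hypothetical slope makes the Lesser Nodal Point
more expensive and the steeper one makes it cheaper, as
described above.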

We found in the results of both our
computational and empirical experiments on the
grayscale resolution trade-off that the Lesser
Nodal Point was at 300 dpi. At first this would
seem wrong since natural images rendered on
300+ dpi laser printers certainly do not look like
continuous images. But there is an important
difference. Our experiments were done with
images that had a 100% spatial fill factor. Laser
printers do not have this property. The
sharpening of the dots in the laser printing
process tends to put more information into lower
spatial frequency bands, making the dots more
visible by providing an input to the spatial
mechanisms of vision that are tuned to lower
spatial frequencies. The peak tuning spatial
frequency of human vision is around 2 cycles per
degree [11]. The dots used in our experiment had
100% fill; this acts like a bandlimiting filter
reducing the aliasing energy present in smaller
dots like those used in laser printing.

Combining the findings in [3] and [19] we can
calculate the number of bits needed to look ‘good
enough’ relative to a very high resolution
standard zone plate. In the case of an image rendered
at 25 fL peak brightness with 50:1 contrast or
less, an information density of 14.4+3 or 17.4 bits
is enough. To optimize the image quality, the
steps should be geometrically spaced [19].
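
Geometric spacing means that each gray level is a fixed
multiple of the one below it, which matches a constant
Weber-Fechner fraction. The sketch below generates such a
scale for the 25 fL, 50:1, 8-level case. Anchoring the darkest
level at peak/contrast and the brightest at the peak is our
assumption; [19] is cited here only for the geometric spacing
itself.

```python
def geometric_levels(peak, contrast, n_levels):
    """Luminances spaced by a constant ratio from peak/contrast to peak."""
    if n_levels < 2 or contrast <= 1:
        raise ValueError("need at least two levels and contrast > 1")
    l_min = peak / contrast
    ratio = contrast ** (1.0 / (n_levels - 1))
    return [l_min * ratio ** k for k in range(n_levels)]


levels_fl = geometric_levels(peak=25.0, contrast=50.0, n_levels=8)
for k, lum in enumerate(levels_fl):
    print(f"level {k}: {lum:.2f} fL")
```

Each step in this scale multiplies luminance by the seventh
root of 50, so the steps are equal on a log luminance axis
rather than on a linear one.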

CONCLUSION 

The number of bits required to render a perfect
image depends upon the human visual system
and the rendering engine. There is a dominance
boundary at which one of these two systems
determines image quality. Finding this boundary
and measuring the Nodal Point defines the
minimal information required to render images
that are indiscriminable from an ideal image.
These dominance boundaries are near the dpi rates
currently being produced by LCD manufacturers
and used in laptop computers and other
information appliance devices.